
Introduction

Across many countries, especially in low- and middle-income settings, governments are increasingly seeking to adopt analytics-driven approaches to public sector human resource (HR) management and civil service reform. Yet most government offices, and often our World Bank regional counterparts, lack standardized tools for transforming raw HRMIS data into formats suitable for rigorous analysis. As a result, each new diagnostic requires bespoke data cleaning, ad hoc coding decisions, and country-specific scripts that are costly to maintain and difficult to replicate.

This fragmentation makes HRM analytics expensive, slow, and inconsistent. It also limits the ability of governments and development partners—including the World Bank—to compare results across time, sectors, or countries, or to build reusable analytics pipelines.

This article describes the standard approach for harmonizing human resource management information system (HRMIS) data, organized into three main modules: Organization, Worker, and Contract. The contents of each module are described in vignette("standard_dictionary"). In this article, we provide a set of helper functions to support the harmonization process across the modules. When users prepare their HRMIS data according to this standard, teams can immediately apply automated tools for data quality control. They can also produce dynamic, reproducible reports on workforce structure and dynamics, pay and compensation analytics, establishment control monitoring, and staffing distribution analysis.

This vignette, harmonization.Rmd, presents the practical workflow for transforming raw HRMIS extracts into these standardized modules. It provides governments with a clear, repeatable procedure for preparing their data so that advanced analytics can be deployed rapidly and consistently—reducing costs, increasing comparability, and strengthening evidence-based HR decision making.

The harmonization workflow demonstrated here shows how to:

  • map raw administrative HR variables into a standardized public sector schema,

  • recode local classifications (education, occupation, contract type) into cross-country taxonomies,

  • generate unique and persistent identifiers for workers, organizations, and contracts,

  • attach missing metadata such as country codes, administrative hierarchies, or reference dates,

  • structure the cleaned data into the three standard HRMIS modules, and

  • validate the outputs against the HRMIS Standard Data Dictionary.
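The first step in this list, mapping raw variables into the standard schema, can be done with a named character vector of the form c(raw_name = "standard_name"). The sketch below does this in base R and stops if an expected raw column is missing. rename_to_standard() is a hypothetical helper written for illustration and is not a govhr function. The workflow below uses data.table::setnames() for the same purpose.

```r
# Hypothetical helper (not part of govhr): rename raw columns to standard names.
# name_map is a named character vector: c(raw_name = "standard_name").
rename_to_standard <- function(df, name_map) {
  missing_raw <- setdiff(names(name_map), names(df))
  if (length(missing_raw) > 0) {
    stop("Raw columns not found: ", paste(missing_raw, collapse = ", "))
  }
  idx <- match(names(name_map), names(df))  # positions of the raw columns
  names(df)[idx] <- unname(name_map)        # overwrite them with standard names
  df
}

# Toy data that mimics a few bra_hrmis columns
raw <- data.frame(MATRICULA = c("1", "22"), CPF = c("a", "b"), JORNADA = c("30", "40"))
std <- rename_to_standard(raw, c(MATRICULA = "contract_id", CPF = "worker_id"))
names(std)
```

Columns absent from the map (JORNADA here) keep their raw names. The map itself then documents which raw variable feeds each standard field.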

By following this procedure, governments and World Bank teams can unlock a common analytics ecosystem in which standardized public sector HR data feeds directly into automated dashboards, diagnostic tools, and monitoring systems—enabling faster, cheaper, and more consistent evidence-based HR reform.

The Raw Data

For this vignette, we use synthetic data modeled on a Brazilian HRMIS and provided at the contract level. The first 1,000 rows are shown below:

reactable(
  head(bra_hrmis, 1e3)
)

We illustrate the harmonization workflow using this dataset. bra_hrmis contains 344,920 rows and 34 columns, covering demographic, employment, organizational, and payroll information for public sector workers.

A quick glimpse of the dataset:

glimpse(bra_hrmis)
## Rows: 344,920
## Columns: 34
## $ ANO_PAGAMENTO                     <chr> "2014", "2014", "2014", "2014", "201…
## $ MES_REFERENCIA                    <chr> "9", "9", "9", "9", "9", "9", "9", "…
## $ MATRICULA                         <chr> "1", "22", "30", "51", "81", "86", "…
## $ CPF                               <chr> "9678b179d65c7d9a40d1eb2f0c687529762…
## $ DATA_NASCIMENTO                   <chr> "15311", "19703", "22122", "24943", 
## $ GENERO                            <chr> "MASCULINO", "FEMININO", "FEMININO",
## $ ESCOLARIDADE                      <chr> "SEGUNDO GRAU COMPLETO", "5 A 8 SERI…
## $ DATA_ADMISSAO                     <chr> "28976", "30167", "29921", "32149", 
## $ ADMINISTRACAO                     <chr> "INDIRETA", "DIRETA", "DIRETA", "DIR…
## $ TIPO_CONTRATO                     <chr> "TEMPORÁRIO", "EFETIVO COMISSIONADO"…
## $ GRUPO                             <chr> "OUTROS", "OUTROS", "OUTROS", "OUTRO…
## $ COD_ORGAO                         <chr> "405502", "301101", "301101", "30110…
## $ ORGAO                             <chr> "GABINETE CIVIL", "GABINETE CIVIL", 
## $ CARREIRA                          <chr> "ENGENHEIRO", "AUXILIAR DE SERVICOS …
## $ CARGO                             <chr> "ENGENHEIRO", "AUXILIAR DE SERVICOS …
## $ JORNADA                           <chr> "30", "40", "40", "30", "40", "40", 
## $ CLASSE                            <chr> NA, "C", "D", "B", "B", "B", "C", "D…
## $ NIVEL                             <chr> "SERV816", "ACENC40", "ACSND40", "AC…
## $ DATA_ULT_PROGRESSAO               <chr> "41671", NA, "40179", "39638", NA, N
## $ SALARIO_BASE                      <chr> "13191.07", "705.73", "3841.43", "21…
## $ CONTRIBUICAO_PREVIDENCIA          <chr> "482.92", "0", "0", "0", "0", "0", "…
## $ ADICIONAL_TEMPO_SERVICO           <chr> "1978.66", "0", "0", "0", "0", "0", 
## $ COMISSAO                          <chr> "0", "442.87", "316.33999999999997",
## $ ABONO_PERMANENCIA                 <chr> "0", "79.64", "422.56", "0", "0", "0…
## $ DECISAO_JUDICIAL                  <chr> "0", "0", "0", "0", "0", "0", "0", "…
## $ DEMAIS_GRATIFICACOES_TRANSITORIAS <chr> "0", "237.05", "0", "0", "24.1", "24…
## $ DEMAIS_GRATIFICACOES_CARREIRA     <chr> "0", "0", "0", "0", "0", "0", "0", "…
## $ SALARIO_BRUTO                     <chr> "15169.73", "1465.29", "4580.33", "2…
## $ SALARIO_LIQUIDO                   <chr> "11474.09", "1465.29", "4369.1000000…
## $ DATA_APOSENTADORIA                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, 
## $ VALOR_BRUTO                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, 
## $ VALOR_LIQUIDO                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, 
## $ TIPO                              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, 
## $ `TEMPO DE CONTRIBUIÇÃO`           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, 

glimpse() gives a brief, surface-level overview of the dataset. It shows the dimensions (number of rows and columns), the type of each variable, and a first sense of how complete (or incomplete) certain variables are. Every column is stored as character, including dates and monetary amounts, so most variables will need explicit type conversion during harmonization.
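To put numbers on completeness, a short base R sketch can compute the share of missing values in each column and sort the result. missing_share() is a hypothetical helper. Treating empty or blank strings as missing is our assumption about how raw extracts encode gaps. Because bra_hrmis is a data frame, the same call can be applied to it directly.

```r
# Hypothetical helper: share of missing values per column, most incomplete first.
# For character columns, empty or blank strings also count as missing (assumption).
missing_share <- function(df) {
  shares <- vapply(df, function(col) {
    miss <- is.na(col)
    if (is.character(col)) miss <- miss | trimws(col) == ""
    mean(miss)
  }, numeric(1))
  sort(shares, decreasing = TRUE)
}

# Toy data loosely shaped like a few bra_hrmis columns
toy <- data.frame(
  CLASSE = c(NA, "C", "D", "B"),
  TIPO   = c(NA, NA, NA, ""),
  GENERO = c("FEMININO", "MASCULINO", "FEMININO", "MASCULINO")
)
ms <- missing_share(toy)
ms
```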

The Harmonization Process

Introduction

The goal of the harmonization process is to establish a consistent, standardized structure for cleaned HR datasets, so that analytics supporting government human resource management start from a common analytical foundation. In many client countries, HRM analytics are conducted in an ad hoc, highly customized manner that varies by consultant, ministry, year, and even dataset. This leads to high analytical costs, limited comparability over time and across institutions, and significant barriers to scaling evidence-based human resource reform.

The harmonization framework addresses these challenges directly. By defining a clear data dictionary and a set of standardized transformations, governments can prepare their HRMIS data in a format that is analytically ready and fully compatible with the tools provided in the govhr package. This ensures:

  • Reproducible and transparent workflows
  • Cross-country and cross-institution comparability
  • Automatic generation of quality checks and dashboards
  • Reduced data preparation cost for each new engagement

The remainder of this section documents the harmonization steps for each of the three core modules derived from the data dictionary:

  1. Contract Module
  2. Worker Module
  3. Organization Module

Each module represents a specific level of analysis defined by the dictionary and is constructed directly from the raw HRMIS data.

Harmonizing the Contract Module

The Contract Module is the foundational component of the harmonized payroll dataset. Its purpose is to extract and structure all information that is unique at the contract and reference-date level, as defined in the data dictionary. A “contract record” is a worker–contract relationship at a specific reference period (usually month–year). It captures the contractual characteristics governing the worker’s employment at that time.

In practice, the raw HRMIS data typically mixes variables at different conceptual levels—worker attributes, position attributes, contract terms, payroll events, and sometimes even one-off administrative transactions.

Working with payroll or HRMIS data in many developing-country contexts often involves very large datasets—sometimes millions of records spanning multiple years. Such data volumes can quickly overwhelm standard analytical workflows if not handled using efficient tools.

In R, there are several paradigms for manipulating two-dimensional data. For this tutorial, we rely on the data.table package for the more computationally intensive tasks. data.table is widely regarded as one of the fastest tools for in-memory data processing in R. It modifies data by reference, avoiding unnecessary copies, uses memory efficiently, and offers a concise syntax suited to large datasets.

Although the dplyr package from the tidyverse is widely used and favored for its readability and intuitive grammar, operations on large payroll datasets can be substantially slower with dplyr alone. To ensure reproducibility, performance, and scalability, especially for teams with limited computational resources, we adopt data.table as the primary engine for harmonization. Where data.table offers no meaningful speed advantage, we use dplyr functions to keep the code readable.

The bra_hrmis column names are in Portuguese, so translating them into English makes the data easier to understand. With only 34 columns, one option is to ask an LLM such as ChatGPT for a quick translation. Alternatively, the polyglotr R package provides a suite of translation functions, which can be applied as follows:

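raw_dictionary <- 
  tibble(raw_colnames_pt  = colnames(bra_hrmis),
         # use named arguments so the source and target languages cannot be swapped
         raw_colnames_eng = polyglotr::google_translate(colnames(bra_hrmis),
                                                        target_language = "en",
                                                        source_language = "pt"))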

kable(raw_dictionary)
raw_colnames_pt raw_colnames_eng
ANO_PAGAMENTO ANO_PAGAMENTO
MES_REFERENCIA MES_REFERENCIA
MATRICULA MATRÍCULA
CPF CPF
DATA_NASCIMENTO DATA_NASCIMENTO
GENERO GÊNERO
ESCOLARIDADE ESCOLARIDADE
DATA_ADMISSAO DATA_ADMISSÃO
ADMINISTRACAO ADMINISTRAÇÃO
TIPO_CONTRATO TIPO_CONTRATO
GRUPO GRUPO
COD_ORGAO COD_ORGAO
ORGAO ORGAO
CARREIRA CARREIRA
CARGO CARGA
JORNADA JORNADA
CLASSE CLASSE
NIVEL NÍVEL
DATA_ULT_PROGRESSAO DATA_ULT_PROGRESSÃO
SALARIO_BASE SALARIO_BASE
CONTRIBUICAO_PREVIDENCIA CONTRIBUICAO_PREVIDÊNCIA
ADICIONAL_TEMPO_SERVICO ADICIONAL_TEMPO_SERVICO
COMISSAO COMISSÃO
ABONO_PERMANENCIA ABONO_PERMANENCIA
DECISAO_JUDICIAL DECISAO_JUDICIAL
DEMAIS_GRATIFICACOES_TRANSITORIAS DEMAIS_GRATIFICACOES_TRANSITORIAS
DEMAIS_GRATIFICACOES_CARREIRA DEMAIS_GRATIFICACOES_CARREIRA
SALARIO_BRUTO SALARIO_BRUTO
SALARIO_LIQUIDO SALARIO_LIQUIDO
DATA_APOSENTADORIA DATA_APOSENTADORIA
VALOR_BRUTO VALOR_BRUTO
VALOR_LIQUIDO VALOR_LIQUIDO
TIPO TIPO
TEMPO DE CONTRIBUIÇÃO TEMPO DE CONTRIBUIÇÃO
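Machine translation of short, underscore-joined identifiers is not always reliable. In the table above, most headers come back in Portuguese and CARGO becomes CARGA. For a dataset this small, a hand-maintained glossary is a simple, reproducible alternative. The sketch below covers only a few headers. The English labels are our own translations rather than an official codebook, and label_columns() is a hypothetical helper.

```r
# Hand-maintained glossary for a few bra_hrmis headers (our own translations).
col_glossary <- c(
  ANO_PAGAMENTO   = "payment year",
  MES_REFERENCIA  = "reference month",
  MATRICULA       = "employee registration number",
  CPF             = "individual taxpayer ID (CPF)",
  DATA_NASCIMENTO = "birth date",
  GENERO          = "gender",
  ESCOLARIDADE    = "education level",
  DATA_ADMISSAO   = "hiring date",
  CARREIRA        = "career",
  CARGO           = "position",
  SALARIO_BASE    = "base salary"
)

# Look up English labels; headers not in the glossary fall back to the raw name.
label_columns <- function(raw_names, glossary) {
  labels <- unname(glossary[raw_names])
  ifelse(is.na(labels), raw_names, labels)
}

labels <- label_columns(c("MATRICULA", "CARGO", "TIPO"), col_glossary)
labels
```

Calling label_columns(colnames(bra_hrmis), col_glossary) would fill raw_colnames_eng without a network call. Headers not yet in the glossary remain visible as untranslated.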

The first step when handling any raw payroll dataset is to find the variable that uniquely identifies each contract in each time period. From the table above and the earlier glimpse(), CPF and MATRICULA are the most likely candidates. We run a quick check as follows:

### first let us convert the data to a data.table object to speed up our computations

bra_hrmis <- as.data.table(bra_hrmis)

# Unique CPF per year
cpf_summary <- bra_hrmis[, .(unique_cpf = uniqueN(CPF),
                             nobs       = .N),
                         by = ANO_PAGAMENTO]

# Unique Matricula per year
mtr_summary <- bra_hrmis[, .(unique_mtr = uniqueN(MATRICULA),
                             nobs       = .N), 
                         by = ANO_PAGAMENTO]

kable(cpf_summary)
ANO_PAGAMENTO unique_cpf nobs
2014 57694 60291
2015 59886 62614
2016 70677 74310
2017 69732 73287
2018 70911 74418
kable(mtr_summary)
ANO_PAGAMENTO unique_mtr nobs
2014 60291 60291
2015 62614 62614
2016 74310 74310
2017 73287 73287
2018 74418 74418

MATRICULA is unique within every year (unique_mtr equals nobs), so it serves as the contract ID. CPF repeats within years, which suggests it identifies workers, some of whom hold more than one contract. CPF will come in handy when harmonizing the Worker Module.
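The per-year check above can be wrapped in a small reusable helper that reports whether a set of key columns identifies rows uniquely. is_unique_key() is hypothetical and written in base R. On bra_hrmis, call it on as.data.frame(bra_hrmis), because data.table subsetting does not use drop. With c("ANO_PAGAMENTO", "MATRICULA") as keys, it should agree with the tables above.

```r
# Hypothetical helper: TRUE when no combination of the key columns repeats.
is_unique_key <- function(df, keys) {
  anyDuplicated(df[, keys, drop = FALSE]) == 0
}

# Toy data: contract IDs repeat across years; one worker holds two contracts in 2014
toy <- data.frame(
  ANO_PAGAMENTO = c("2014", "2014", "2015", "2015"),
  MATRICULA     = c("1", "22", "1", "22"),
  CPF           = c("a1", "a1", "a1", "b2")
)
is_unique_key(toy, c("ANO_PAGAMENTO", "MATRICULA"))  # TRUE
is_unique_key(toy, c("ANO_PAGAMENTO", "CPF"))        # FALSE
```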

Now, we can begin creating the set of variables as defined by the standard_dictionary vignette. It is often useful to begin by creating the derived variables. There are four sets of derived variables within the dictionary:

    1. the International Standard Classification of Occupations (ISCO) variables, i.e. occupation_isconame and occupation_iscocode, are derived from the original occupation variables occupation_native and occupation_english. We use polyglotr::google_translate() to translate the Portuguese occupation names into English and labourR::classify_occupation() to classify these occupations into four-digit (level-4) ISCO unit groups.
#-----------------------------
# 2. Build occupation table (active + inactive)
#-----------------------------

occup_df <- bra_hrmis[, c("CARREIRA", "CARGO")]

# keep distinct CARREIRA/CARGO pairs
occup_df <- unique(occup_df)

# Add translated vars
occup_df[, occupation_native := tolower(CARREIRA)]

occup_df[, occupation_english := tolower(google_translate(text = CARREIRA,
                                                          source_language = "pt",
                                                          target_language = "en"))]

#-----------------------------
# 3. Classify occupations
#-----------------------------
class_occup_df <- copy(occup_df)[,
  .(id = .I, text = occupation_english)
]

class_occup_df <- classify_occupation(class_occup_df,
                                      isco_level = 4,
                                      lang       = "en", 
                                      num_leaves = 1)


# display the classified occupations (10 rows per page)
datatable(class_occup_df, options = list(pageLength = 10))
#-----------------------------
# 4. Merge classification back into occup_df
#-----------------------------
occup_df[, id := .I]

# merge iscoGroup
occup_df <- merge(
  occup_df,
  class_occup_df[, .(id = as.integer(id), iscoGroup)],
  by = "id",
  all.x = TRUE
)

setnames(occup_df, "iscoGroup", "occupation_iscocode")

# merge ISCO descriptions
occup_df <- merge(occup_df,
                  isco[, c("unit", "description")] |> as.data.table(),
                  by.x = "occupation_iscocode",
                  by.y = "unit",
                  all.x = TRUE)

setnames(occup_df, "description", "occupation_isconame")

#-----------------------------
# 5. Bring classified occupations back to bra_hrmis (all rows)
#-----------------------------

## we perform a left join with data.table syntax for speed as the merging into 
## the hrmis dataset could be computationally intensive

bra_hrmis <- occup_df[bra_hrmis, on = c("CARREIRA", "CARGO")]
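The occupation step above keeps the slow translation and classification calls small by running them on distinct CARREIRA/CARGO pairs and joining the results back. Because only CARREIRA is translated, working on its distinct values would reduce the number of calls further. The same pattern in base R is shown below. apply_on_unique() and fake_translate() are hypothetical, and the latter stands in for a web API call.

```r
# Run an expensive per-value function once per distinct value, then map back.
apply_on_unique <- function(x, fun) {
  ux <- unique(x)
  results <- fun(ux)
  results[match(x, ux)]
}

# Stand-in for a slow translation call; counts how many values it receives
n_calls <- 0
fake_translate <- function(v) {
  n_calls <<- n_calls + length(v)
  toupper(v)
}

occupations <- c("engenheiro", "medico", "engenheiro", "engenheiro", NA)
out <- apply_on_unique(occupations, fake_translate)
out
```

Only three values reach fake_translate() (the two names and NA), even though the input has five elements.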
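    2. the date variables (ref_date, start_date, end_date) must be standardized because they are used to build the time-variant compensation variables. ref_date is built from the payment year and month. The admission and retirement dates are stored as 5-digit Excel-style serial numbers (days counted from 1899-12-30). Below, we create the date variables, along with the country and administrative identifiers: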
### let's include the dates
# ref_date: first day of the payment month (year "2014", month "9" -> 2014-09-01)
bra_hrmis[, ref_date := as.Date(paste(ANO_PAGAMENTO, MES_REFERENCIA, "01", sep = "-"))]

# start_date / end_date: Excel-style serial day counts, day 0 = 1899-12-30
setnames(bra_hrmis, c("DATA_ADMISSAO", "DATA_APOSENTADORIA"), c("start_date", "end_date"))
bra_hrmis[, start_date := as.Date(as.integer(start_date), origin = "1899-12-30")]
bra_hrmis[, end_date := as.Date(as.integer(end_date), origin = "1899-12-30")]

### let's include the country code and admin identifiers
bra_hrmis[, country_code := "BRA"]
bra_hrmis[, country_name := "Brazil"]
bra_hrmis[, adm1_name := "Alagoas"]
bra_hrmis[, adm1_code := "AL"]
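Both date conversions can be checked on values from the first row of bra_hrmis. The serials 15311 (DATA_NASCIMENTO) and 28976 (DATA_ADMISSAO) should become 1941-12-01 and 1979-05-01, the dates shown in the harmonized outputs. The helper names below are hypothetical. Passing an explicit format to as.Date() is our choice, so that malformed inputs become NA instead of raising an error.

```r
# Excel-style serial day counts: day 0 is 1899-12-30.
serial_to_date <- function(x) {
  as.Date(as.integer(x), origin = "1899-12-30")
}

# First day of the reference month from year and month strings such as "2014", "9".
make_ref_date <- function(year, month) {
  as.Date(sprintf("%04d-%02d-01", as.integer(year), as.integer(month)),
          format = "%Y-%m-%d")
}

serial_to_date(c("15311", "28976"))
make_ref_date("2014", "9")
```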
    3. the compensation variables, i.e. gross_salary_ppp, base_salary_ppp, and net_salary_ppp, are derived from their local currency equivalents gross_salary_lcu, base_salary_lcu, and net_salary_lcu. This package provides a convert_constant_ppp() function and a macro_indicators dataset to convert the nominal variables (_lcu) into their real, PPP-adjusted equivalents (_ppp).
### let's convert the nominal compensation variables to real (_ppp) values

# Step 1: rename the raw pay variables to their dictionary (_lcu) names, in place
setnames(bra_hrmis, 
         old = c("SALARIO_BASE", "SALARIO_BRUTO", "SALARIO_LIQUIDO", "ABONO_PERMANENCIA"),
         new = c("base_salary_lcu", "gross_salary_lcu", "net_salary_lcu", "allowance_lcu"))

# Step 2: identify all *_lcu columns and convert them from character to numeric
cols_to_convert <- grep("_lcu$", names(bra_hrmis), value = TRUE)
bra_hrmis[, (cols_to_convert) := lapply(.SD, as.numeric), .SDcols = cols_to_convert]

# Step 3: inspect the CPI and PPP figures for Brazil in the years covered
pfw_df <- 
  macro_indicators |>
  dplyr::filter(country_code == "BRA", 
                year %in% as.integer(unique(bra_hrmis$ANO_PAGAMENTO))) |>
  dplyr::select(dplyr::all_of(c("country_code", "year", "cpi", "ppp")))

# Step 4: add the payment year, then apply convert_constant_ppp() to build the _ppp columns
bra_hrmis[, year := lubridate::year(ref_date)]
bra_hrmis <- convert_constant_ppp(data = bra_hrmis,
                                  cols = cols_to_convert,
                                  macro_indicators = macro_indicators)

### while we are at it, let's include the working-hours variable as well
bra_hrmis[, whours := as.numeric(JORNADA)]
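The following sketch makes the nominal-to-real step concrete with one common way to express LCU values in constant international dollars. It deflates each value to base-year prices with the CPI, then divides by the base-year PPP conversion factor (local currency units per international dollar). This is an assumption for illustration only. convert_constant_ppp() may use a different base year, index, or formula, to_constant_ppp() is hypothetical, and the macro table contains made-up numbers.

```r
# Assumed formula: value_ppp = value_lcu * (cpi_base / cpi_year) / ppp_base
to_constant_ppp <- function(value_lcu, year, macro, base_year) {
  if (!base_year %in% macro$year) stop("base_year not found in macro table")
  cpi_year <- macro$cpi[match(year, macro$year)]      # CPI in each value's year
  cpi_base <- macro$cpi[match(base_year, macro$year)] # CPI in the base year
  ppp_base <- macro$ppp[match(base_year, macro$year)] # LCU per international $
  value_lcu * (cpi_base / cpi_year) / ppp_base
}

# Made-up macro figures, for illustration only
macro <- data.frame(year = c(2014, 2015), cpi = c(80, 100), ppp = c(2, 2.5))
res <- to_constant_ppp(c(100, 100), year = c(2014, 2015), macro = macro, base_year = 2015)
res
```

Here 100 LCU earned in 2014 is first inflated to 125 in 2015 prices, then divided by 2.5 to give 50 constant international dollars.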
    4. the contract type variable TIPO_CONTRATO needs to be reclassified into the dictionary variable contract_type. We do so as follows:
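### create a small dictionary mapping each raw TIPO_CONTRATO class to a contract_type
### (pairs listed explicitly so the mapping does not depend on the order returned by unique())
contract_dict <- 
  data.table(TIPO_CONTRATO = c("TEMPORÁRIO", "EFETIVO COMISSIONADO", "EFETIVO",
                               "EXCLUSIVAMENTE COMISSIONADO", "TEMPOR¡RIO",
                               "INATIVO", "PENSIONISTA"),
             contract_type = c("short-term", "permanent", "permanent", "open-term",
                               "short-term", "inactive", "retired"))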

kable(contract_dict)
TIPO_CONTRATO contract_type
TEMPORÁRIO short-term
EFETIVO COMISSIONADO permanent
EFETIVO permanent
EXCLUSIVAMENTE COMISSIONADO open-term
TEMPOR¡RIO short-term
INATIVO inactive
PENSIONISTA retired

We now join contract_dict into bra_hrmis using data.table's join syntax:

bra_hrmis <- contract_dict[bra_hrmis, on = "TIPO_CONTRATO"]
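A join against a lookup table silently leaves NA for any raw category missing from the dictionary. This can happen when a new extract introduces another spelling or encoding of a value, such as the TEMPOR¡RIO variant in the table above. The following sketch recodes through an explicit lookup and stops on unmapped categories. recode_strict() is hypothetical and applies equally to the education dictionary in the Worker Module.

```r
# Hypothetical helper: recode via an explicit lookup; fail loudly on unmapped values.
recode_strict <- function(x, dict_from, dict_to) {
  stopifnot(length(dict_from) == length(dict_to), !anyDuplicated(dict_from))
  unmapped <- setdiff(unique(x[!is.na(x)]), dict_from)
  if (length(unmapped) > 0) {
    stop("Unmapped categories: ", paste(unmapped, collapse = ", "))
  }
  dict_to[match(x, dict_from)]  # NA inputs stay NA
}

raw_type <- c("EFETIVO", "EXCLUSIVAMENTE COMISSIONADO", "INATIVO", "EFETIVO")
recoded <- recode_strict(raw_type,
                         dict_from = c("EFETIVO", "EXCLUSIVAMENTE COMISSIONADO", "INATIVO"),
                         dict_to   = c("permanent", "open-term", "inactive"))
recoded
```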

Now that the derived variables in this contract module have been created, we are ready to create the rest of the variables all at once to finalize the module.

setnames(bra_hrmis,
         old = c("MATRICULA", "CPF", "ORGAO", "CLASSE", "NIVEL"),
         new = c("contract_id", "worker_id", "org_id", "paygrade", "seniority"))


### now let us select the final set of dictionary variables to complete the module
### dplyr::select() is used for readability; column selection is cheap, so data.table adds little here
bra_hrmis_contract <- 
  bra_hrmis |>
  dplyr::select(
    contract_id, worker_id, org_id,
    ends_with("_date"),
    contains("_salary_"),
    contract_type, 
    starts_with("occupation_"),
    country_code, country_name,
    adm1_name, adm1_code,
    whours, paygrade, seniority
  )

Here is what our final data looks like:

## Rows: 344,920
## Columns: 24
## $ contract_id         <chr> "1", "22", "30", "51", "81", "86", "102", "103", "…
## $ worker_id           <chr> "9678b179d65c7d9a40d1eb2f0c687529762fe73ee1e48e768…
## $ org_id              <chr> "GABINETE CIVIL", "GABINETE CIVIL", "SECRETARIA DE…
## $ start_date          <date> 1979-05-01, 1982-08-04, 1981-12-01, 1988-01-07, 1…
## $ end_date            <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N
## $ ref_date            <date> 2014-09-01, 2014-09-01, 2014-09-01, 2014-09-01, 2…
## $ base_salary_lcu     <dbl> 13191.07, 705.73, 3841.43, 2100.44, 699.90, 699.90…
## $ gross_salary_lcu    <dbl> 15169.73, 1465.29, 4580.33, 2100.44, 1166.87, 1166…
## $ net_salary_lcu      <dbl> 11474.09, 1465.29, 4369.10, 2082.87, 1166.87, 1166…
## $ base_salary_ppp     <dbl> 3652.4100, 195.4061, 1063.6345, 581.5804, 193.7918…
## $ gross_salary_ppp    <dbl> 4200.2714, 405.7169, 1268.2249, 581.5804, 323.0889…
## $ net_salary_ppp      <dbl> 3177.0039, 405.7169, 1209.7385, 576.7156, 323.0889…
## $ contract_type       <chr> "short-term", "permanent", "permanent", "permanent…
## $ occupation_iscocode <chr> "2149", "9311", "3119", "9311", NA, "9311", "2422"…
## $ occupation_native   <chr> "engenheiro", "auxiliar de servicos dive", "tecnic…
## $ occupation_english  <chr> "engineer", "dive services assistant", "planning t…
## $ occupation_isconame <chr> "Engineering Professionals Not Elsewhere Classifie…
## $ country_code        <chr> "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "…
## $ country_name        <chr> "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
## $ adm1_name           <chr> "Alagoas", "Alagoas", "Alagoas", "Alagoas", "Alago…
## $ adm1_code           <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A…
## $ whours              <dbl> 30, 40, 40, 30, 40, 40, 30, 40, 30, 40, 30, 30, 40…
## $ paygrade            <chr> NA, "C", "D", "B", "B", "B", "C", "D", "C", "D", "…
## $ seniority           <chr> "SERV816", "ACENC40", "ACSND40", "ACMNB30", "ACENB…

Harmonizing the Worker Module

The Worker Module standardizes individual-level information about public sector employees. This section demonstrates the harmonization pipeline using the same contract-level Brazil (Alagoas) HRMIS dataset, bra_hrmis. We will produce a clean, harmonized dataset named bra_hrmis_worker that conforms to the Worker Module dictionary in standard_dictionary.Rmd.

Let’s start with a little exploration. We need to understand how many unique worker-organization-refdate combinations are within the data:

## how many observations should we expect to have?
worker_count <- 
  bra_hrmis |>
  dplyr::select(worker_id, org_id, ref_date) |>
  uniqueN()

This tells us that there are 334,248 unique worker_id-org_id-ref_date combinations in the entire dataset. The goal is for the Worker Module to keep this number of rows once all of its other variables are added. As we did in the Contract Module, we now add all the derived variables:

## let's derive the remaining worker variables, starting with education


# Create the education harmonization dictionary for educat7
education_dictionary <- data.table(
  ESCOLARIDADE = c(
    "ANALFABETO",
    "1 A 4 SERIE DO PRIM. GRAU INCOMPLETO",
    "5 A 8 SERIE DO PRIM. GRAU INCOMPLETO",
    "1 A 4 SERIE DO PRIM. GRAU COMPLETO",
    "5 A 8 SERIE DO PRIM. GRAU COMPLETO",
    "SEGUNDO GRAU INCOMPLETO",
    "SEGUNDO GRAU COMPLETO",
    "ESPECIALIZAÇÃO COMPLETO",
    "ESPECIALIZAÇÃO INCOMPLETO",
    "ESPECIALIZA«√O COMPLETO",
    "ESPECIALIZA«√O INCOMPLETO",
    "CURSO SUPERIOR COMPLETO",
    "CURSO SUPERIOR INCOMPLETO",
    "MESTRADO INCOMPLETO",
    NA_character_
  ),
  educat7 = c(
    "No education",
    "Primary incomplete",
    "Primary incomplete",
    "Primary complete",
    "Primary complete",
    "Secondary incomplete",
    "Secondary complete",
    "Higher than secondary but not university",
    "Higher than secondary but not university",
    "Higher than secondary but not university",
    "Higher than secondary but not university",
    "University incomplete or complete",
    "University incomplete or complete",
    "University incomplete or complete",
    NA_character_
  )
)

# Merge the dictionary into bra_hrmis
bra_hrmis <- education_dictionary[bra_hrmis, on = "ESCOLARIDADE"]


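### let's relabel the birth date and gender variables (setnames() works by reference)
setnames(bra_hrmis,
         old = c("DATA_NASCIMENTO", "GENERO"),
         new = c("birth_date", "gender"))

### convert birth_date from an Excel-style serial number (day 0 = 1899-12-30) to a Date
bra_hrmis[, birth_date := as.Date(as.integer(birth_date), origin = "1899-12-30")]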
## Warning in as.Date(as.integer(birth_date), origin = "1899-12-30"): NAs
## introduced by coercion
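To see which raw values triggered the coercion warning above, list the non-missing entries that fail to parse as integers. This must run on the raw DATA_NASCIMENTO column before the conversion overwrites it, for example on a copy of the raw data. failed_coercions() is a hypothetical helper, and the input vector below is made up.

```r
# Hypothetical helper: raw values that are present but cannot be read as integers.
failed_coercions <- function(raw) {
  parsed <- suppressWarnings(as.integer(raw))
  unique(raw[!is.na(raw) & is.na(parsed)])
}

bad <- failed_coercions(c("15311", "19703", "1975-03-02", NA, "abc", "abc"))
bad
```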

The remaining Worker Module variables (tribe and race) are not available in the raw bra_hrmis data, so we create them as missing (NA) variables.

bra_hrmis[, c("tribe", "race") := .(NA_character_, NA_character_)] ## create the variables missing from the raw data as character NA
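When several dictionary variables are absent, a small helper can add them all at once. In R a bare NA is logical, so the sketch uses NA_character_ by default to keep placeholder columns character-typed. This matters for type checks such as the col_is_character() validations later in this article. add_missing_columns() is hypothetical and not part of govhr, and the required-column list is illustrative.

```r
# Hypothetical helper: add absent dictionary columns as typed missing values.
add_missing_columns <- function(df, required, na_value = NA_character_) {
  for (col in setdiff(required, names(df))) {
    df[[col]] <- rep(na_value, nrow(df))
  }
  df
}

worker <- data.frame(worker_id = c("a", "b"), gender = c("FEMININO", "MASCULINO"))
worker <- add_missing_columns(worker, c("worker_id", "gender", "tribe", "race"))
str(worker)
```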

Finally, we create the Worker Module by keeping the unique rows across the worker-level variables now available in bra_hrmis. See the implementation below:

bra_hrmis_worker <- 
  bra_hrmis |>
  dplyr::select(worker_id, org_id, ref_date, birth_date, gender, educat7,
                country_name, country_code, adm1_name, adm1_code) |>
  unique()

Let’s take a look at bra_hrmis_worker:

glimpse(bra_hrmis_worker)
## Rows: 338,842
## Columns: 10
## $ worker_id    <chr> "9678b179d65c7d9a40d1eb2f0c687529762fe73ee1e48e768ca2e3c0…
## $ org_id       <chr> "GABINETE CIVIL", "GABINETE CIVIL", "SECRETARIA DE ESTADO…
## $ ref_date     <date> 2014-09-01, 2014-09-01, 2014-09-01, 2014-09-01, 2014-09-…
## $ birth_date   <date> 1941-12-01, 1953-12-10, 1960-07-25, 1968-04-15, 1966-10-…
## $ gender       <chr> "MASCULINO", "FEMININO", "FEMININO", "FEMININO", "MASCULI…
## $ educat7      <chr> "Secondary complete", "Primary incomplete", "Higher than …
## $ country_name <chr> "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil…
## $ country_code <chr> "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "…
## $ adm1_name    <chr> "Alagoas", "Alagoas", "Alagoas", "Alagoas", "Alagoas", "A…
## $ adm1_code    <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL…
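The Worker Module has 338,842 rows, while the earlier count found 334,248 unique worker_id-org_id-ref_date combinations. Some combinations therefore carry more than one set of attribute values, presumably differing birth_date, gender, or educat7 records across a worker's contracts. The following sketch lists those rows so they can be inspected and resolved. conflicting_keys() is a hypothetical base R helper, and the toy data is made up.

```r
# Hypothetical helper: rows whose key combination appears with conflicting attributes.
conflicting_keys <- function(df, keys) {
  df <- unique(df)  # exact duplicates are not conflicts
  k <- df[, keys, drop = FALSE]
  dup <- duplicated(k) | duplicated(k, fromLast = TRUE)
  df[dup, , drop = FALSE]
}

worker <- data.frame(
  worker_id  = c("a", "a", "b"),
  org_id     = c("GABINETE CIVIL", "GABINETE CIVIL", "GABINETE CIVIL"),
  ref_date   = as.Date(c("2014-09-01", "2014-09-01", "2014-09-01")),
  birth_date = as.Date(c("1960-07-25", "1960-07-26", "1953-12-10"))
)
conflicts <- conflicting_keys(worker, c("worker_id", "org_id", "ref_date"))
conflicts
```

On the real data, this would be applied to as.data.frame(bra_hrmis_worker) with the same three keys.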

Harmonizing the Organization Module

The Organization Module extracts, standardizes, and structures information on public sector organizations from the HRMIS records in bra_hrmis. The steps below convert the raw organization identifiers into a canonical, well-structured organization register according to the harmonization dictionary. See below:

### get the set of variables according to the dictionary
### (org_type, org_parent and org_child are not available in the raw data)
bra_hrmis_org <- 
  bra_hrmis |>
  dplyr::mutate(org_name_native = org_id,
                org_type = NA,
                org_parent = NA,
                org_child = NA) |>
  dplyr::select(org_id, org_name_native, ref_date, org_type, org_parent, org_child,
                country_code, country_name, adm1_name, adm1_code) |>
  unique() |>
  dplyr::mutate(org_name_en = polyglotr::google_translate(org_name_native, target_language = "en"))


glimpse(bra_hrmis_org)
## Rows: 256
## Columns: 11
## $ org_id          <chr> "GABINETE CIVIL", "SECRETARIA DE ESTADO DA ARTICULACAO…
## $ org_name_native <chr> "GABINETE CIVIL", "SECRETARIA DE ESTADO DA ARTICULACAO…
## $ ref_date        <date> 2014-09-01, 2014-09-01, 2014-09-01, 2014-09-01, 2014-…
## $ org_type        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ org_parent      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ org_child       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ country_code    <chr> "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "BRA"…
## $ country_name    <chr> "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Bra…
## $ adm1_name       <chr> "Alagoas", "Alagoas", "Alagoas", "Alagoas", "Alagoas",
## $ adm1_code       <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
## $ org_name_en     <chr> "CIVIL OFFICE", "STATE SECRETARIAT FOR SOCIAL ARTICULA…

Quality Check for Harmonized Data

Quality Control for the Contract Module

The function qualitycheck_contractmod() performs a comprehensive set of checks on a harmonized HRMIS contract table. It ensures that all required variables exist, validates their data types, verifies uniqueness conditions, and conducts salary outlier analysis using interquartile range (IQR) thresholds.

Key Checks Performed

  • Column existence: Ensures all required fields such as contract_id, worker_id, org_id, salary fields, date fields, and administrative codes are present.

  • Uniqueness: Observations must be unique at the contract_id–org_date level.

  • Type validation: Ensures character, date, and numeric variables are correctly typed.

  • Salary Outlier Detection: For base_salary_lcu, gross_salary_lcu, and net_salary_lcu, the function computes an IQR-based lower and upper bound and checks that all values fall within expected ranges.

  • Logical constraints: whours must be between 0 and 60. Country codes must follow the 3-letter ISO3 format.

  • ISCO validation: Checks that occupation codes and occupation names exist in the official ISCO classification list.
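The salary, hours, and country-code checks can be reproduced by hand to understand what a failure means. The following is a minimal sketch. It assumes the conventional 1.5 * IQR fences (quantile() default, type 7) and treats missing values as failures. Both are assumptions, and qualitycheck_contractmod() may define these differently. All function names are hypothetical.

```r
# IQR fences: [Q1 - k * IQR, Q3 + k * IQR], with k = 1.5 assumed.
iqr_bounds <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

# Missing values count as failures in these sketches.
within_bounds <- function(x, bounds) {
  !is.na(x) & x >= bounds[["lower"]] & x <= bounds[["upper"]]
}
valid_whours <- function(h) !is.na(h) & h >= 0 & h <= 60
valid_iso3   <- function(code) grepl("^[A-Z]{3}$", code)

b <- iqr_bounds(1:9)  # Q1 = 3, Q3 = 7, IQR = 4
b
within_bounds(c(1, 13, 14, NA), b)
valid_whours(c(30, 40, 70, NA))
valid_iso3(c("BRA", "br", "BRAZ", NA))
```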

Please see the check below:

qualitycheck_contractmod(bra_hrmis_contract |> as_tibble())
Pointblank Validation: QCheck for Contract Module
Table type: tibble. Action levels: WARN 1, STOP, NOTIFY.

| Step | Validation | Label | Columns | Values | Units | Pass | Fail |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | col_exists() | All required columns are present | contract_id | | 1 | 1 (1.00) | 0 (0.00) |
| 2 | col_exists() | All required columns are present | worker_id | | 1 | 1 (1.00) | 0 (0.00) |
| 3 | col_exists() | All required columns are present | org_id | | 1 | 1 (1.00) | 0 (0.00) |
| 4 | col_exists() | All required columns are present | org_date | | 1 | 1 (1.00) | 0 (0.00) |
| 5 | col_exists() | All required columns are present | base_salary_lcu | | 1 | 1 (1.00) | 0 (0.00) |
| 6 | col_exists() | All required columns are present | gross_salary_lcu | | 1 | 1 (1.00) | 0 (0.00) |
| 7 | col_exists() | All required columns are present | net_salary_lcu | | 1 | 1 (1.00) | 0 (0.00) |
| 8 | col_exists() | All required columns are present | whours | | 1 | 1 (1.00) | 0 (0.00) |
| 9 | col_exists() | All required columns are present | country_code | | 1 | 1 (1.00) | 0 (0.00) |
| 10 | col_exists() | All required columns are present | start_date | | 1 | 1 (1.00) | 0 (0.00) |
| 11 | col_exists() | All required columns are present | end_date | | 1 | 1 (1.00) | 0 (0.00) |
| 12 | col_exists() | All required columns are present | occupation_native | | 1 | 1 (1.00) | 0 (0.00) |
| 13 | col_exists() | All required columns are present | occupation_english | | 1 | 1 (1.00) | 0 (0.00) |
| 14 | col_exists() | All required columns are present | year | | 1 | 1 (1.00) | 0 (0.00) |
| 15 | col_exists() | All required columns are present | occupation_iscocode | | 1 | 1 (1.00) | 0 (0.00) |
| 16 | col_exists() | All required columns are present | occupation_isconame | | 1 | 1 (1.00) | 0 (0.00) |
| 17 | col_exists() | All required columns are present | country_name | | 1 | 1 (1.00) | 0 (0.00) |
| 18 | col_exists() | All required columns are present | adm1_name | | 1 | 1 (1.00) | 0 (0.00) |
| 19 | col_exists() | All required columns are present | adm1_code | | 1 | 1 (1.00) | 0 (0.00) |
| 20 | col_exists() | All required columns are present | paygrade | | 1 | 1 (1.00) | 0 (0.00) |
| 21 | col_exists() | All required columns are present | seniority | | 1 | 1 (1.00) | 0 (0.00) |
| 22 | rows_distinct() | Unique at the contract-year level | contract_id, org_date | | 345K | 14K (0.04) | 331K (0.96) |
| 23 | col_is_character() | Character variables are the correct type | contract_id | | 1 | 1 (1.00) | 0 (0.00) |
| 24 | col_is_character() | Character variables are the correct type | worker_id | | 1 | 1 (1.00) | 0 (0.00) |
| 25 | col_is_character() | Character variables are the correct type | org_id | | 1 | 1 (1.00) | 0 (0.00) |
| 26 | col_is_character() | Character variables are the correct type | country_code | | 1 | 1 (1.00) | 0 (0.00) |
| 27 | col_is_character() | Character variables are the correct type | occupation_isconame | | 1 | 1 (1.00) | 0 (0.00) |
| 28 | col_is_character() | Character variables are the correct type | occupation_iscocode | | 1 | 1 (1.00) | 0 (0.00) |
| 29 | col_is_character() | Character variables are the correct type | occupation_native | | 1 | 1 (1.00) | 0 (0.00) |
| 30 | col_is_character() | Character variables are the correct type | occupation_english | | 1 | 1 (1.00) | 0 (0.00) |
| 31 | col_is_character() | Character variables are the correct type | country_name | | 1 | 1 (1.00) | 0 (0.00) |
| 32 | col_is_date() | Date variables are the appropriate type | org_date | | 1 | 0 (0.00) | 1 (1.00) |
| 33 | col_is_date() | Date variables are the appropriate type | start_date | | 1 | 1 (1.00) | 0 (0.00) |
| 34 | col_is_date() | Date variables are the appropriate type | end_date | | 1 | 1 (1.00) | 0 (0.00) |
| 35 | col_is_numeric() | Numeric variables are the right class | base_salary_lcu | | 1 | 1 (1.00) | 0 (0.00) |
| 36 | col_is_numeric() | Numeric variables are the right class | gross_salary_lcu | | 1 | 1 (1.00) | 0 (0.00) |
| 37 | col_is_numeric() | Numeric variables are the right class | net_salary_lcu | | 1 | 1 (1.00) | 0 (0.00) |
| 38 | col_is_numeric() | Numeric variables are the right class | whours | | 1 | 1 (1.00) | 0 (0.00) |
| 39 | col_vals_between() | Base salary is within the expected range | base_salary_lcu | [−2,705.275, 7,794.845] | 345K | 168K (0.49) | 177K (0.51) |
| 40 | col_vals_between() | Gross salary is within the expected range | gross_salary_lcu | [−2,915.145, 8,667.855] | 345K | 164K (0.48) | 180K (0.52) |
| 41 | col_vals_between() | Net salary is within the expected range | net_salary_lcu | [−2,411.99, 7,606.89] | 345K | 166K (0.48) | 179K (0.52) |
| 42 | col_vals_lte() | Hours worked is less than 60 | whours | 60 | 345K | 216K (0.63) | 129K (0.37) |
| 43 | col_vals_gte() | Hours worked is greater than 0 | whours | 0 | 345K | 216K (0.63) | 129K (0.37) |
| 44 | col_vals_regex() | Country Code is the 3 letters | country_code | ^[A-Z]{3}$ | 345K | 345K (1.00) | 0 (0.00) |
45
col_vals_in_set

All `occupation_isconame` values are valid

col_vals_in_set()

&marker;occupation_isconame

Legislators, Senior Government Officials, Traditional Chiefs and Heads of Villages, Senior Officials of Special-interest Organizations, Managing Directors and Chief Executives, Finance Managers, Human Resource Managers, Policy and Planning Managers, Business Services and Administration Managers Not Elsewhere Classified, Sales and Marketing Managers, Advertising and Public Relations Managers, Research and Development Managers, Agricultural and Forestry Production Managers, Aquaculture and Fisheries Production Managers, Manufacturing Managers, Mining Managers, Construction Managers, Supply, Distribution and Related Managers, Information and Communications Technology Services Managers, Child Care Services Managers, Health Services Managers, Aged Care Services Managers, Social Welfare Managers, Education Managers, Financial and Insurance Services Branch Managers, Professional Services Managers Not Elsewhere Classified, Hotel Managers, Restaurant Managers, Retail and Wholesale Trade Managers, Sports, Recreation and Cultural Centre Managers, Services Managers Not Elsewhere Classified, Physicists and Astronomers, Meteorologists, Chemists, Geologists and Geophysicists, Mathematicians, Actuaries and Statisticians, Biologists, Botanists, Zoologists and Related Professionals, Farming, Forestry and Fisheries Advisers, Environmental Protection Professionals, Industrial and Production Engineers, Civil Engineers, Environmental Engineers, Mechanical Engineers, Chemical Engineers, Mining Engineers, Metallurgists and Related Professionals, Engineering Professionals Not Elsewhere Classified, Electrical Engineers, Electronics Engineers, Telecommunications Engineers, Building Architects, Landscape Architects, Product and Garment Designers, Town and Traffic Planners, Cartographers and Surveyors, Graphic and Multimedia Designers, Generalist Medical Practitioners, Specialist Medical Practitioners, Nursing Professionals, Midwifery Professionals, Traditional and Complementary Medicine 
Professionals, Paramedical Practitioners, Veterinarians, Dentists, Pharmacists, Environmental and Occupational Health and Hygiene Professionals, Physiotherapists, Dieticians and Nutritionists, Audiologists and Speech Therapists, Optometrists and Ophthalmic Opticians, Health Professionals Not Elsewhere Classified, University and Higher Education Teachers, Vocational Education Teachers, Secondary Education Teachers, Primary School Teachers, Early Childhood Educators, Education Methods specialists, Special Needs Teachers, Other Language Teachers, Other Music Teachers, Other Arts Teachers, Information Technology Trainers, Teaching Professionals Not Elsewhere Classified, Accountants, Financial and Investment Advisers, Financial Analysts, Management and Organization Analysts, Policy Administration Professionals, Personnel and Careers Professionals, Training and Staff Development Professionals, Advertising and Marketing Professionals, Public Relations Professionals, Technical and Medical Sales Professionals (excluding ICT), Information and Communications Technology Sales Professionals, Systems Analysts, Software Developers, Web and Multimedia Developers, Applications Programmers, Software and Applications Developers and Analysts Not Elsewhere Classified, Database Designers and Administrators, Systems Administrators, Computer Network Professionals, Database and Network Professionals Not Elsewhere Classified, Lawyers, Judges, Legal Professionals Not Elsewhere Classified, Archivists and Curators, Librarians and Related Information Professionals, Economists, Sociologists, Anthropologists and Related Professionals, Philosophers, Historians and Political Scientists, Psychologists, Social Work and Counselling Professionals, Religious Professionals, Authors and Related Writers, Journalists, Translators, Interpreters and Other Linguists, Visual Artists, Musicians, Singers and Composers, Dancers and Choreographers, Film, Stage and Related Directors and Producers, Actors, Announcers 
on Radio, Television and Other Media, Creative and Performing Artists Not Elsewhere Classified, Chemical and Physical Science Technicians, Civil Engineering Technicians, Electrical Engineering Technicians, Electronics Engineering Technicians, Mechanical Engineering Technicians, Chemical Engineering Technicians, Mining and Metallurgical Technicians, Draughtspersons, Physical and Engineering Science Technicians Not Elsewhere Classified, Mining Supervisors, Manufacturing Supervisors, Construction Supervisors, Power Production Plant Operators, Incinerator and Water Treatment Plant Operators, Chemical Processing Plant Controllers, Petroleum and Natural Gas Refining Plant Operators, Metal Production Process Controllers, Process Control Technicians Not Elsewhere Classified, Life Science Technicians (excluding Medical), Agricultural Technicians, Forestry Technicians, Ships Engineers, Ships Deck Officers and Pilots, Aircraft Pilots and Related Associate Professionals, Air Traffic Controllers, Air Traffic Safety Electronics Technicians, Medical Imaging and Therapeutic Equipment Technicians, Medical and Pathology Laboratory Technicians, Pharmaceutical Technicians and Assistants, Medical and Dental Prosthetic Technicians, Nursing Associate Professionals, Midwifery Associate Professionals, Traditional and Complementary Medicine Associate Professionals, Veterinary Technicians and Assistants, Dental Assistants and Therapists, Medical Records and Health Information Technicians, Community Health Workers, Dispensing Opticians, Physiotherapy Technicians and Assistants, Medical Assistants, Environmental and Occupational Health Inspectors and Associates, Ambulance Workers, Health Associate Professionals Not Elsewhere Classified, Securities and Finance Dealers and Brokers, Credit and Loans Officers, Accounting Associate Professionals, Statistical, Mathematical and Related Associate Professionals, Valuers and Loss Assessors, Insurance Representatives, Commercial Sales Representatives, 
Buyers, Trade Brokers, Clearing and Forwarding Agents, Conference and Event Planners, Employment Agents and Contractors, Real Estate Agents and Property Managers, Business Services Agents Not Elsewhere Classified, Office Supervisors, Legal Secretaries, Administrative and Executive Secretaries, Medical Secretaries, Customs and Border Inspectors, Government Tax and Excise Officials, Government Social Benefits Officials, Government Licensing Officials, Police Inspectors and Detectives, Government Regulatory Associate Professionals Not Elsewhere Classified, Legal and Related Associate Professionals, Social Work Associate Professionals, Religious Associate Professionals, Athletes and Sports Players, Sports Coaches, Instructors and Officials, Fitness and Recreation Instructors and Programme Leaders, Photographers, Interior Designers and Decorators, Gallery, Museum and Library Technicians, Chefs, Other Artistic and Cultural Associate Professionals, Information and Communications Technology Operations Technicians, Information and Communications Technology User Support Technicians, Computer Network and Systems Technicians, Web Technicians, Broadcasting and Audiovisual Technicians, Telecommunications Engineering Technicians, General Office Clerks, Secretaries (general), Typists and Word Processing Operators, Data Entry Clerks, Bank Tellers and Related Clerks, Bookmakers, Croupiers and Related Gaming Workers, Pawnbrokers and Money-lenders, Debt Collectors and Related Workers, Travel Consultants and Clerks, Contact Centre Information Clerks, Telephone Switchboard Operators, Hotel Receptionists, Inquiry Clerks, Receptionists (general), Survey and Market Research Interviewers, Client Information Workers Not Elsewhere Classified, Accounting and Bookkeeping Clerks, Statistical, Finance and Insurance Clerks, Payroll Clerks, Stock Clerks, Production Clerks, Transport Clerks, Library Clerks, Mail Carriers and Sorting Clerks, Coding, Proofreading and Related Clerks, Scribes and 
Related Workers, Filing and Copying Clerks, Personnel Clerks, Clerical Support Workers Not Elsewhere Classified, Travel Attendants and Travel Stewards, Transport Conductors, Travel Guides, Cooks, Waiters, Bartenders, Hairdressers, Beauticians and Related Workers, Cleaning and Housekeeping Supervisors in Offices, Hotels and Other Establishments, Domestic Housekeepers, Building Caretakers, Astrologers, Fortune-tellers and Related Workers, Companions and Valets, Undertakers and Embalmers, Pet Groomers and Animal Care Workers, Driving Instructors, Personal Services Workers Not Elsewhere Classified, Stall and Market Salespersons, Street Food Salespersons, Shopkeepers, Shop Supervisors, Shop Sales Assistants, Cashiers and Ticket Clerks, Fashion and Other Models, Sales Demonstrators, Door-to-door Salespersons, Contact Centre Salespersons, Service Station Attendants, Food Service Counter Attendants, Sales Workers Not Elsewhere Classified, Child Care Workers, Teachers Aides, Health Care Assistants, Home-based Personal Care Workers, Personal Care Workers in Health Services Not Elsewhere Classified, Firefighters, Police Officers, Prison Guards, Security Guards, Protective Services Workers Not Elsewhere Classified, Field Crop and Vegetable Growers, Tree and Shrub Crop Growers, Gardeners; Horticultural and Nursery Growers, Mixed Crop Growers, Livestock and Dairy Producers, Poultry Producers, Apiarists and Sericulturists, Animal Producers Not Elsewhere Classified, Mixed Crop and Animal Producers, Forestry and Related Workers, Aquaculture Workers, Inland and Coastal Waters Fishery Workers, Deep-sea Fishery Workers, Hunters and Trappers, Subsistence Crop Farmers, Subsistence Livestock Farmers, Subsistence Mixed Crop and Livestock Farmers, Subsistence Fishers, Hunters, Trappers and Gatherers, House Builders, Bricklayers and Related Workers, Stonemasons, Stone Cutters, Splitters and Carvers, Concrete Placers, Concrete Finishers and Related Workers, Carpenters and Joiners, Building 
Frame and Related Trades Workers Not Elsewhere Classified, Roofers, Floor Layers and Tile Setters, Plasterers, Insulation Workers, Glaziers, Plumbers and Pipe Fitters, Air Conditioning and Refrigeration Mechanics, Painters and Related Workers, Spray Painters and Varnishers, Building Structure Cleaners, Metal Moulders and Coremakers, Welders and Flame Cutters, Sheet Metal Workers, Structural Metal Preparers and Erectors, Riggers and Cable Splicers, Blacksmiths, Hammersmiths and Forging Press Workers, Toolmakers and Related Workers, Metal Working Machine Tool Setters and Operators, Metal Polishers, Wheel Grinders and Tool Sharpeners, Motor Vehicle Mechanics and Repairers, Aircraft Engine Mechanics and Repairers, Agricultural and Industrial Machinery Mechanics and Repairers, Bicycle and Related Repairers, Precision-instrument Makers and Repairers, Musical Instrument Makers and Tuners, Jewellery and Precious Metal Workers, Potters and Related Workers, Glass Makers, Cutters, Grinders and Finishers, Signwriters, Decorative Painters, Engravers and Etchers, Handicraft Workers in Wood, Basketry and Related Materials, Handicraft Workers in Textile, Leather and Related Materials, Handicraft Workers Not Elsewhere Classified, Pre-press Technicians, Printers, Print Finishing and Binding Workers, Building and Related Electricians, Electrical Mechanics and Fitters, Electrical Line Installers and Repairers, Electronics Mechanics and Servicers, Information and Communications Technology Installers and Servicers, Butchers, Fishmongers and Related Food Preparers, Bakers, Pastry-cooks and Confectionery Makers, Dairy Products Makers, Fruit, Vegetable and Related Preservers, Food and Beverage Tasters and Graders, Tobacco Preparers and Tobacco Products Makers, Wood Treaters, Cabinet-makers and Related Workers, Woodworking Machine Tool Setters and Operators, Tailors, Dressmakers, Furriers and Hatters, Garment and Related Patternmakers and Cutters, Sewing, Embroidery and Related Workers, 
Upholsterers and Related Workers, Pelt Dressers, Tanners and Fellmongers, Shoemakers and Related Workers, Underwater Divers, Shotfirers and Blasters, Product Graders and Testers (excluding Foods and Beverages), Fumigators and Other Pest and Weed Controllers, Craft and Related Workers Not Elsewhere Classified, Miners and Quarriers, Mineral and Stone Processing Plant Operators, Well Drillers and Borers and Related Workers, Cement, Stone and Other Mineral Products Machine Operators, Metal Processing Plant Operators, Metal Finishing, Plating and Coating Machine Operators, Chemical Products Plant and Machine Operators, Photographic Products Machine Operators, Rubber Products Machine Operators, Plastic Products Machine Operators, Paper Products Machine Operators, Fibre Preparing, Spinning and Winding Machine Operators, Weaving and Knitting Machine Operators, Sewing Machine Operators, Bleaching, Dyeing and Fabric Cleaning Machine Operators, Fur and Leather Preparing Machine Operators, Shoemaking and Related Machine Operators, Laundry Machine Operators, Textile, Fur and Leather Products Machine Operators Not Elsewhere Classified, Food and Related Products Machine Operators, Pulp and Papermaking Plant Operators, Wood Processing Plant Operators, Glass and Ceramics Plant Operators, Steam Engine and Boiler Operators, Packing, Bottling and Labelling Machine Operators, Stationary Plant and Machine Operators Not Elsewhere Classified, Mechanical Machinery Assemblers, Electrical and Electronic Equipment Assemblers, Assemblers Not Elsewhere Classified, Locomotive Engine Drivers, Railway Brake, Signal and Switch Operators, Motorcycle Drivers, Car, Taxi and Van Drivers, Bus and Tram Drivers, Heavy Truck and Lorry Drivers, Mobile Farm and Forestry Plant Operators, Earthmoving and Related Plant Operators, Crane, Hoist and Related Plant Operators, Lifting Truck Operators, Ships Deck Crews and Related Workers, Domestic Cleaners and Helpers, Cleaners and Helpers in Offices, Hotels and 
Other Establishments, Hand Launderers and Pressers, Vehicle Cleaners, Window Cleaners, Other Cleaning Workers, Crop Farm Labourers, Livestock Farm Labourers, Mixed Crop and Livestock Farm Labourers, Garden and Horticultural Labourers, Forestry Labourers, Fishery and Aquaculture Labourers, Mining and Quarrying Labourers, Civil Engineering Labourers, Building Construction Labourers, Hand Packers, Manufacturing Labourers Not Elsewhere Classified, Hand and Pedal Vehicle Drivers, Drivers of Animal-drawn Vehicles and Machinery, Freight Handlers, Shelf Fillers, Fast Food Preparers, Kitchen Helpers, Street and Related Services Workers, Street Vendors (excluding Food), Garbage and Recycling Collectors, Refuse Sorters, Sweepers and Related Labourers, Messengers, Package Deliverers and Luggage Porters, Odd-job Persons, Meter Readers and Vending-machine Collectors, Water and Firewood Collectors, Elementary Workers Not Elsewhere Classified, Commissioned Armed Forces Officers, Non-commissioned Armed Forces Officers, Armed Forces Occupations, Other Ranks

Result for step 45: units 205K, pass 203K (0.99), fail 1K (0.01)

46 | col_vals_in_set() | occupation_iscocode | All `occupation_iscocode` values are valid | allowed values:

1111, 1112, 1113, 1114, 1120, 1211, 1212, 1213, 1219, 1221, 1222, 1223, 1311, 1312, 1321, 1322, 1323, 1324, 1330, 1341, 1342, 1343, 1344, 1345, 1346, 1349, 1411, 1412, 1420, 1431, 1439, 2111, 2112, 2113, 2114, 2120, 2131, 2132, 2133, 2141, 2142, 2143, 2144, 2145, 2146, 2149, 2151, 2152, 2153, 2161, 2162, 2163, 2164, 2165, 2166, 2211, 2212, 2221, 2222, 2230, 2240, 2250, 2261, 2262, 2263, 2264, 2265, 2266, 2267, 2269, 2310, 2320, 2330, 2341, 2342, 2351, 2352, 2353, 2354, 2355, 2356, 2359, 2411, 2412, 2413, 2421, 2422, 2423, 2424, 2431, 2432, 2433, 2434, 2511, 2512, 2513, 2514, 2519, 2521, 2522, 2523, 2529, 2611, 2612, 2619, 2621, 2622, 2631, 2632, 2633, 2634, 2635, 2636, 2641, 2642, 2643, 2651, 2652, 2653, 2654, 2655, 2656, 2659, 3111, 3112, 3113, 3114, 3115, 3116, 3117, 3118, 3119, 3121, 3122, 3123, 3131, 3132, 3133, 3134, 3135, 3139, 3141, 3142, 3143, 3151, 3152, 3153, 3154, 3155, 3211, 3212, 3213, 3214, 3221, 3222, 3230, 3240, 3251, 3252, 3253, 3254, 3255, 3256, 3257, 3258, 3259, 3311, 3312, 3313, 3314, 3315, 3321, 3322, 3323, 3324, 3331, 3332, 3333, 3334, 3339, 3341, 3342, 3343, 3344, 3351, 3352, 3353, 3354, 3355, 3359, 3411, 3412, 3413, 3421, 3422, 3423, 3431, 3432, 3433, 3434, 3435, 3511, 3512, 3513, 3514, 3521, 3522, 4110, 4120, 4131, 4132, 4211, 4212, 4213, 4214, 4221, 4222, 4223, 4224, 4225, 4226, 4227, 4229, 4311, 4312, 4313, 4321, 4322, 4323, 4411, 4412, 4413, 4414, 4415, 4416, 4419, 5111, 5112, 5113, 5120, 5131, 5132, 5141, 5142, 5151, 5152, 5153, 5161, 5162, 5163, 5164, 5165, 5169, 5211, 5212, 5221, 5222, 5223, 5230, 5241, 5242, 5243, 5244, 5245, 5246, 5249, 5311, 5312, 5321, 5322, 5329, 5411, 5412, 5413, 5414, 5419, 6111, 6112, 6113, 6114, 6121, 6122, 6123, 6129, 6130, 6210, 6221, 6222, 6223, 6224, 6310, 6320, 6330, 6340, 7111, 7112, 7113, 7114, 7115, 7119, 7121, 7122, 7123, 7124, 7125, 7126, 7127, 7131, 7132, 7133, 7211, 7212, 7213, 7214, 7215, 7221, 7222, 7223, 7224, 7231, 7232, 7233, 7234, 7311, 7312, 7313, 7314, 7315, 7316, 7317, 7318, 7319, 7321, 
7322, 7323, 7411, 7412, 7413, 7421, 7422, 7511, 7512, 7513, 7514, 7515, 7516, 7521, 7522, 7523, 7531, 7532, 7533, 7534, 7535, 7536, 7541, 7542, 7543, 7544, 7549, 8111, 8112, 8113, 8114, 8121, 8122, 8131, 8132, 8141, 8142, 8143, 8151, 8152, 8153, 8154, 8155, 8156, 8157, 8159, 8160, 8171, 8172, 8181, 8182, 8183, 8189, 8211, 8212, 8219, 8311, 8312, 8321, 8322, 8331, 8332, 8341, 8342, 8343, 8344, 8350, 9111, 9112, 9121, 9122, 9123, 9129, 9211, 9212, 9213, 9214, 9215, 9216, 9311, 9312, 9313, 9321, 9329, 9331, 9332, 9333, 9334, 9411, 9412, 9510, 9520, 9611, 9612, 9613, 9621, 9622, 9623, 9624, 9629, 0110, 0210, 0310

Result for step 46: units 205K, pass 205K (1.00), fail 0 (0.00)

Validation started 2025-11-15 14:15:27 UTC, took 2.8 s, and ended 2025-11-15 14:15:29 UTC.

Quality Control for the Worker Module

Worker-level harmonization requires that identification, demographic, and reference-date information be present and valid. The function qualitycheck_worker() applies a small set of essential validation steps to verify the basic integrity of the worker table.

Key Checks Performed

  • Required variables exist: worker_id, ref_date, birth_date, gender, educat7, tribe, race, and status.

  • Uniqueness: Worker IDs must be unique within each reference date (ref_date).

  • Non-missingness: Required fields cannot be NA.

  • Birth-date validation: Birth dates must fall between 1900-01-01 and 2000-01-01, which flags data-entry errors and implausible dates of birth.
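To make the four rules concrete, here is a minimal base-R sketch of the same logic. It is not how qualitycheck_worker() is implemented (the package reports through pointblank); the helper name check_worker_rules(), its arguments, and its return structure are hypothetical, and the toy data are invented for illustration.

```r
# Hypothetical base-R illustration of the worker-module rules listed above.
# Not the package's qualitycheck_worker(); it only counts rule violations.
check_worker_rules <- function(worker,
                               required = c("worker_id", "ref_date", "birth_date",
                                            "gender", "educat7", "tribe",
                                            "race", "status"),
                               dob_min = as.Date("1900-01-01"),
                               dob_max = as.Date("2000-01-01")) {
  present <- intersect(required, names(worker))

  # Uniqueness: flag every row whose (worker_id, ref_date) pair occurs more than once
  keys <- worker[, c("worker_id", "ref_date")]
  dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

  # Non-missingness: a row fails if any present required column is NA
  complete <- stats::complete.cases(worker[, present, drop = FALSE])

  # Birth-date window, inclusive at both ends; NA dates count as failures
  dob <- as.Date(worker$birth_date)
  dob_ok <- !is.na(dob) & dob >= dob_min & dob <= dob_max

  list(
    missing_columns = setdiff(required, names(worker)),
    duplicate_rows  = sum(dup),
    incomplete_rows = sum(!complete),
    bad_birth_dates = sum(!dob_ok)
  )
}

# Toy data (invented): no tribe column, one repeated key,
# race always NA, and one birth date before 1900
workers <- data.frame(
  worker_id  = c("A", "A", "B"),
  ref_date   = as.Date("2020-01-01"),
  birth_date = as.Date(c("1985-05-01", "1985-05-01", "1890-01-01")),
  gender = "F", educat7 = 3, race = NA, status = "active",
  stringsAsFactors = FALSE
)
res <- check_worker_rules(workers)
str(res)
```

The sketch counts every row in a repeated (worker_id, ref_date) group as a violation; calling duplicated() only once would flag just the second and later copies.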

Please see the check below:

qualitycheck_worker(bra_hrmis_worker |> as_tibble())
Pointblank Validation
Quality check for Worker Module
tibble worker_tbl | WARN 1 | STOP | NOTIFY
STEP | VALIDATION | COLUMNS (VALUES) | LABEL | RESULT

1 | col_exists() | tribe | All required columns are present | units 1, pass 0 (0.00), fail 1 (1.00)
2 | rows_distinct() | worker_id, ref_date | Worker ID is unique. | units 339K, pass 319K (0.94), fail 20K (0.06)
3 | col_vals_not_null() | tribe | Values are not missing. | evaluation error (💥)
4 | col_vals_between() | birth_date (value shown: -25567) | Date of birth is valid | evaluation error (💥)

Validation started 2025-11-15 14:15:38 UTC, took 3.3 s, and ended 2025-11-15 14:15:41 UTC.
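The value -25567 shown for the birth-date step is consistent with the lower bound of 1900-01-01 written as a day count, because R stores a Date as the number of days since 1970-01-01. Both bounds of the birth-date window can be checked directly:

```r
# R stores Date values as days since 1970-01-01
as.numeric(as.Date("1900-01-01"))  # -25567
as.numeric(as.Date("2000-01-01"))  # 10957
```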

Quality Control for the Organization Module

The organization module captures the identifiers and names of public sector organizations (ministries, departments, and agencies), their hierarchical parent-child relationships, and their geographic attributes. The function qualitycheck_orgmod() validates the structural and coding integrity of this table.

Key Checks Performed

  • Required variables exist: org_id, org_name_native, org_name_en (English name), org_parent, org_child, country_code, country_name, adm1_name, and adm1_code.

  • Row uniqueness: Each org_id must be unique.

  • Non-missingness: All required variables must be populated.

  • Type validation: All organization fields must be character.

  • ISO3 validation: Ensures that country_code belongs to the official World Bank/ISO list of three-letter country codes, using the countrycode package.
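A similar base-R sketch for the organization rules follows. Again, check_org_rules() is a hypothetical helper, not the package's qualitycheck_orgmod(). The set of valid ISO3 codes is passed in as an argument; in practice it could be built from the non-missing values of countrycode::codelist$iso3c (assuming that package is installed), but the toy call below uses a short hand-written vector and invented data.

```r
# Hypothetical base-R illustration of the organization-module rules listed above.
# Not the package's qualitycheck_orgmod(); it only counts rule violations.
check_org_rules <- function(org, valid_iso3,
                            required = c("org_id", "org_name_native", "org_name_en",
                                         "org_parent", "org_child", "country_code",
                                         "country_name", "adm1_name", "adm1_code")) {
  present <- intersect(required, names(org))
  ids <- org$org_id

  list(
    missing_columns = setdiff(required, names(org)),
    # Row uniqueness: every row whose org_id occurs more than once
    duplicated_ids  = sum(duplicated(ids) | duplicated(ids, fromLast = TRUE)),
    # Non-missingness: number of NA values per required column
    na_counts       = vapply(org[present], function(x) sum(is.na(x)), integer(1)),
    # Type validation: required columns that are not character
    non_character   = present[!vapply(org[present], is.character, logical(1))],
    # ISO3 validation: rows whose country_code is outside the reference set
    invalid_iso3    = sum(!(org$country_code %in% valid_iso3))
  )
}

# Toy data (invented): a repeated org_id, empty parent/child columns,
# and one unknown country code
org <- data.frame(
  org_id          = c("01", "01", "02"),
  org_name_native = c("Ministerio A", "Ministerio A", "Secretaria B"),
  org_name_en     = c("Ministry A", "Ministry A", "Secretariat B"),
  org_parent      = NA,
  org_child       = NA,
  country_code    = c("BRA", "BRA", "XXX"),
  country_name    = "Brazil",
  adm1_name       = "Region 1",
  adm1_code       = "R1",
  stringsAsFactors = FALSE
)
res <- check_org_rules(org, valid_iso3 = c("ARG", "BRA", "CHL"))
str(res)
```

In the toy data, org_parent and org_child contain only NA, so R stores them as logical columns; they therefore fail both the non-missing rule and the character-type rule.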

Please see the check below:

qualitycheck_orgmod(bra_hrmis_org |> as_tibble())
Pointblank Validation
QCheck for Organization Module
tibble org_tbl | WARN 1 | STOP | NOTIFY
STEP | VALIDATION | COLUMNS (VALUES) | LABEL | RESULT

1 | col_exists() | org_name_native | All required variables were harmonized | units 1, pass 1 (1.00), fail 0 (0.00)
2 | col_exists() | org_id | All required variables were harmonized | units 1, pass 1 (1.00), fail 0 (0.00)
3 | col_exists() | country_code | All required variables were harmonized | units 1, pass 1 (1.00), fail 0 (0.00)
4 | col_exists() | country_name | All required variables were harmonized | units 1, pass 1 (1.00), fail 0 (0.00)
5 | col_exists() | adm1_name | All required variables were harmonized | units 1, pass 1 (1.00), fail 0 (0.00)
6 | col_exists() | adm1_code | All required variables were harmonized | units 1, pass 1 (1.00), fail 0 (0.00)
7 | col_exists() | org_parent | All required variables were harmonized | units 1, pass 1 (1.00), fail 0 (0.00)
8 | col_exists() | org_child | All required variables were harmonized | units 1, pass 1 (1.00), fail 0 (0.00)
9 | col_exists() | org_name_en | All required variables were harmonized | units 1, pass 1 (1.00), fail 0 (0.00)
10 | rows_distinct() | org_id | Data is unique at the organization level | units 256, pass 6 (0.02), fail 250 (0.98)
11 | col_vals_not_null() | org_name_native | Column values are not null | units 256, pass 256 (1.00), fail 0 (0.00)
12 | col_vals_not_null() | org_id | Column values are not null | units 256, pass 256 (1.00), fail 0 (0.00)
13 | col_vals_not_null() | country_code | Column values are not null | units 256, pass 256 (1.00), fail 0 (0.00)
14 | col_vals_not_null() | country_name | Column values are not null | units 256, pass 256 (1.00), fail 0 (0.00)
15 | col_vals_not_null() | adm1_name | Column values are not null | units 256, pass 256 (1.00), fail 0 (0.00)
16 | col_vals_not_null() | adm1_code | Column values are not null | units 256, pass 256 (1.00), fail 0 (0.00)
17 | col_vals_not_null() | org_parent | Column values are not null | units 256, pass 0 (0.00), fail 256 (1.00)
18 | col_vals_not_null() | org_child | Column values are not null | units 256, pass 0 (0.00), fail 256 (1.00)
19 | col_vals_not_null() | org_name_en | Column values are not null | units 256, pass 256 (1.00), fail 0 (0.00)
20 | col_is_character() | org_name_native | Character variables are properly type set | units 1, pass 1 (1.00), fail 0 (0.00)
21 | col_is_character() | org_id | Character variables are properly type set | units 1, pass 1 (1.00), fail 0 (0.00)
22 | col_is_character() | country_code | Character variables are properly type set | units 1, pass 1 (1.00), fail 0 (0.00)
23 | col_is_character() | country_name | Character variables are properly type set | units 1, pass 1 (1.00), fail 0 (0.00)
24 | col_is_character() | adm1_name | Character variables are properly type set | units 1, pass 1 (1.00), fail 0 (0.00)
25 | col_is_character() | adm1_code | Character variables are properly type set | units 1, pass 1 (1.00), fail 0 (0.00)
26 | col_is_character() | org_parent | Character variables are properly type set | units 1, pass 0 (0.00), fail 1 (1.00)
27 | col_is_character() | org_child | Character variables are properly type set | units 1, pass 0 (0.00), fail 1 (1.00)
28 | col_is_character() | org_name_en | Character variables are properly type set | units 1, pass 1 (1.00), fail 0 (0.00)
29 | col_vals_in_set() | country_code | the country_code variable belongs to the official ISO-3 codes | allowed values:

AFG, ALB, DZA, ASM, AND, AGO, AIA, ATA, ATG, ARG, ARM, ABW, AUS, AUT, AZE, BHS, BHR, BGD, BRB, BLR, BEL, BLZ, BEN, BMU, BTN, BOL, BIH, BWA, BVT, BRA, IOT, VGB, BRN, BGR, BFA, BDI, KHM, CMR, CAN, CPV, BES, CYM, CAF, TCD, CHL, CHN, CXR, CCK, COL, COM, COG, COD, COK, CRI, HRV, CUB, CUW, CYP, CZE, CIV, DNK, DJI, DMA, DOM, ECU, EGY, SLV, GNQ, ERI, EST, SWZ, ETH, FLK, FRO, FJI, FIN, FRA, GUF, PYF, ATF, GAB, GMB, GEO, DEU, GHA, GIB, GRC, GRL, GRD, GLP, GUM, GTM, GGY, GIN, GNB, GUY, HTI, HMD, HND, HKG, HUN, ISL, IND, IDN, IRN, IRQ, IRL, IMN, ISR, ITA, JAM, JPN, JEY, JOR, KAZ, KEN, KIR, KWT, KGZ, LAO, LVA, LBN, LSO, LBR, LBY, LIE, LTU, LUX, MAC, MDG, MWI, MYS, MDV, MLI, MLT, MHL, MTQ, MRT, MUS, MYT, MEX, FSM, MDA, MCO, MNG, MNE, MSR, MAR, MOZ, MMR, NAM, NRU, NPL, NLD, NCL, NZL, NIC, NER, NGA, NIU, NFK, PRK, MKD, MNP, NOR, OMN, PAK, PLW, PSE, PAN, PNG, PRY, PER, PHL, PCN, POL, PRT, PRI, QAT, ROU, RUS, RWA, REU, MAF, WSM, SMR, SAU, SEN, SRB, SYC, SLE, SGP, SXM, SVK, SVN, SLB, SOM, ZAF, SGS, KOR, SSD, ESP, LKA, BLM, SHN, KNA, LCA, SPM, VCT, SDN, SUR, SJM, SWE, CHE, SYR, STP, TWN, TJK, TZA, THA, TLS, TGO, TKL, TON, TTO, TUN, TUR, TKM, TCA, TUV, VIR, UGA, UKR, ARE, GBR, USA, UMI, URY, UZB, VUT, VAT, VEN, VNM, WLF, ESH, YEM, ZMB, ZWE, ALA

Result for step 29: units 256, pass 256 (1.00), fail 0 (0.00)

Validation started 2025-11-15 14:15:42 UTC, took < 1 s, and ended 2025-11-15 14:15:43 UTC.